1 Introduction


Object recognition is one of the most ubiquitous problems in computer vision, arising in a wide 
range of applications and tasks. As a consequence, considerable effort has been expended in 
tackling variations of the problem, with a particular emphasis recently on model-based methods.
In the standard model-based approach, a set of stored geometric models is compared
against geometric features that have been extracted from an image of a scene (cf. [4, 11]).
Comparing a model with a set of image features generally involves finding a valid correspondence
between a subset of the model features and a subset of the image features. For a
correspondence to be valid, it is usually required that there exist some transformation of a 
given type mapping each model feature (roughly) onto its corresponding image feature. This 
transformation generally specifies the pose of the object - its position and orientation with 
respect to the image coordinate system. The goal thus is to deduce the existence of a legal 
transformation from model to image and to measure its ‘quality’. In other words, the goal is 
to determine whether there is an instance of the transformed object model in the scene, and 
the extent of the model present in the data. 

More formally, let $\{F_i \mid 1 \le i \le m\}$ be a set of model features measured in a coordinate
frame $\mathcal{M}$, let $\{f_i \mid 1 \le i \le s\}$ be a set of sensory features measured in a coordinate
frame $\mathcal{S}$, and let $T : \mathcal{M} \to \mathcal{S}$ denote a legal transformation from the model
coordinate frame to the sensor coordinate frame. The goal is to identify a correspondence,
$I \subseteq 2^{m \times s}$, that pairs model features with sensor features. Each such correspondence $I$
specifies some transformation $T_I$ which maps each model feature close to its corresponding
image feature.² That is

$$I = \{(m_i, s_j) \mid \rho(T_I m_i, s_j) \le \epsilon\},$$

where $m_i$ and $s_j$ range over the model and sensor features, $\rho$ is some appropriate measure
(e.g. Euclidean distance in the case of point features, or maximum Euclidean separation in the
case of line features) and $\epsilon$ is some bound on uncertainty. In general the quality of such an
interpretation is measured in terms of the number of pairs of model and image features, that is,
the cardinality of $I$, $|I|$. The goal of recognition is generally either to find the best
interpretation, maximizing $|I|$, or all interpretations where $|I| > t$ for some threshold $t$.
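The definition of $I$ can be read as a brute-force test over all model/sensor pairs. The following is a minimal sketch of that reading for point features, assuming Euclidean distance for $\rho$; the function name `correspondence`, the toy translation `shift`, and the sample coordinates are illustrative assumptions, not part of the method described here.

```python
import math

def correspondence(model_pts, image_pts, transform, eps):
    """Return the set I of index pairs (i, j) with rho(T m_i, s_j) <= eps.

    rho is taken to be Euclidean distance (an assumption suited to point features).
    """
    pairs = set()
    for i, m in enumerate(model_pts):
        tm = transform(m)  # model feature mapped into the sensor frame
        for j, s in enumerate(image_pts):
            if math.dist(tm, s) <= eps:
                pairs.add((i, j))
    return pairs

# Hypothetical data: a pure 2D translation stands in for the legal transformation T.
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
image = [(5.1, 5.0), (6.0, 4.9), (9.0, 9.0)]

def shift(p):
    return (p[0] + 5.0, p[1] + 5.0)

I = correspondence(model, image, shift, eps=0.2)
quality = len(I)  # |I|, the number of matched pairs
```

Here two of the three model points land within $\epsilon$ of an image point, so $|I| = 2$.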

Many of the approaches to the recognition problem can be distinguished by the manner in
which they search for solutions. One class of methods focuses on finding the correspondence
$I$, typically by searching a potentially exponential sized space of pairings of model and data
features (e.g. [5, 7, 10, 11]). A second class of methods focuses on finding the pose $T$, typically
by searching a potentially infinite resolution space of possible transformations (e.g. [2, 8, 9, 18,
19, 20, 21, 22]). A third class of methods is a hybrid of the other two, in that correspondences
of a small number of features are used to explicitly transform a model into image coordinates
(e.g. [1, 3, 6, 16, 17]), guiding further search for correspondences.

² A given interpretation $I$ will in fact generally define a range of 'equivalent' transformations, in the sense
that there are a number of transformations that generate the same set $I$.

We are primarily interested in methods in the second and third classes, because they 
compute transformations from model coordinate frame to image coordinate frame using a 
small number of feature pairs. When the sensor data can be measured exactly, the fact that 
a small number of features are used to compute a pose does not cause problems. For real 
vision systems, however, there is generally uncertainty in measuring the locations of data 
features, and resulting uncertainty in the estimated pose. In this paper we develop methods 
for bounding the degree of uncertainty in a three-dimensional transformation computed from 
a small number of pairs of model and image points. The specific pose estimation method that 
we investigate is that of [16, 17]; however, the results are very similar for a number of other
methods (e.g., [23, 24]). This pose estimation method uses the correspondence of 3 model 
and image features to compute the three-dimensional position and orientation of a model with 
respect to an image, under a ‘weak perspective’ imaging model (orthographic projection plus 
scaling). 

The central idea of most pose-based recognition methods (such as alignment, geometric 
hashing, generalized Hough transform) is to use a small number of corresponding model and 
image features to estimate the pose of an object acting under some kind of transformation. 
The methods then differ in terms of how the computed pose is used to identify possible 
interpretations of the model in the image. The pose clustering and pose hashing methods 
compute all (or many) possible transformations, and then search the transformation space for 
clusters of similar transformations. In contrast, the alignment approaches explicitly transform 
the model into the image coordinate frame. The effects of pose errors in these two cases will 
be different, and we analyze the two cases separately. 

Implementation and testing of pose-based recognition methods has been reported in the 
literature, with good results. An open issue, however, is the sensitivity of such methods to noise 
in the sensory data. This includes both the range of uncertainty associated with a computed 
transformation, and the effect of this uncertainty on the range of possible positions for other 
aligned model features. The answers to these questions can be used to address other issues, such 
as analyzing the probability of false positive responses, as well as using this analysis to build 
accurate verification algorithms. In addressing these issues, we first derive expressions for the 
degree of uncertainty in computing the pose, given bounds on the degree of sensing uncertainty. 
Then we apply these results to analyze pose clustering methods, and alignment methods. The
big difference between these methods is that the former methods operate explicitly in the 
space of possible poses (or transformations), whereas the latter methods operate in the space 
of image measurements. 

Previous Results 

Some earlier work has been done on simpler versions of these questions, such as recognition that 
is restricted to 2D objects that can rotate and translate in the image plane [12] or recognition 
that is restricted to 2D objects that can be arbitrarily rotated and translated in 3D, then 
projected into the image plane [14]. While these are useful for analyzing particular cases 
of recognition, we are interested in extending these results to the full case of a non-planar 
3D object undergoing arbitrary rigid motion (under a ‘weak perspective’ imaging model of 
orthographic projection plus scaling). 

2 Computing 3D Pose from 2D data 

The pose estimation technique that we evaluate in detail is that of [16, 17], though similar
results hold for other techniques that use a small number of points to estimate three-dimensional
pose. This method of computing the pose operates as follows: We are given three image points
and three model points, each measured in their own coordinate system; the result is the
translation, scale and three-dimensional rotation that position the three model points in space such
that they map onto the image points under orthographic projection. The original specification
of this method assumes exact measurement of the image points is possible. In contrast, our
development here assumes that each image measurement is only known to within some
uncertainty disc of a given radius, $\epsilon$. We speak of the nominal measured image points, which are the
centers of these discs. The measured points can be used to compute an exact transformation,
and then we are concerned with the variations in this transformation as the locations of the
image points vary within their respective discs.

Let one of the measured image points be designated the origin of the points, represented by
the vector $\vec{o}$, measured in the image coordinate system. Let the relative vectors from this point
to the other two points be $\vec{m}$ and $\vec{n}$, also measured in image coordinates. Similarly, let $\vec{O}$, $\vec{M}$
and $\vec{N}$ denote the three model vectors corresponding to $\vec{o}$, $\vec{m}$ and $\vec{n}$, measured in a coordinate
system centered on the model. For convenience, we assume that the model coordinate origin
is in fact at $\vec{O}$ (see Figure 1a). We also assume that the model can be reoriented so that $\vec{M}$
and $\vec{N}$ lie in a plane parallel to the image plane. Note that we use the notation $\vec{x}$ for general
vectors, and $\hat{x}$ for unit vectors. Also note that we assume that the optic axis is along $\hat{z}$.

Figure 1: Computing the pose. Part a: A pair of basis vectors in the image have been selected, as well as
a pair of model basis vectors. The origin of the model basis is assumed to project to the origin of the image
coordinate system. Part b: After the translation, the origin point of the selected model basis projects to the
origin of the selected image basis. Part c: After the first rotation, the first model axis projects along the first
image axis. Part d: After the next two rotations, both model axes project along their corresponding image
axis, such that the ratios of the lengths of the axes are the same in the projected model and the image.

Our version of the pose estimation algorithm is summarized below. The method described 
in [16] is similar, but for 2D objects. The method in [17], for 3D objects, is more direct and 
appears to be numerically more stable. The method used here, however, more readily lends 
itself to error analysis of the type desired (and the two methods are equivalent except for 
numerical stability issues). In particular, the method used here allows us to isolate each of the 
six transformation parameters into individual steps of the computation. 

For the exact transformation specified by the measured sensory data, the steps are:

1. Translate the model so that the origins align. A point $\vec{P}$ is then transformed to $\vec{P}'$ by:

$$\vec{P}' = \vec{P} + \vec{o} - \Pi_{\hat{z}}\vec{O}$$

where $\Pi_{\hat{z}}$ denotes projection along the $\hat{z}$ axis (see Figure 1b).

2. Rotate the model by an angle $\psi$ about the axis parallel to $\hat{z}$ and emanating from $\vec{O}'$ so
that $\Pi_{\hat{z}}\vec{M}$ lies along $\hat{m}$, leading to the transformation

$$\vec{P}'' = R_{\hat{z},\psi}\vec{P}'$$

where $R_{\hat{z},\psi}$ denotes a rotation of angle $\psi$ about the unit axis $\hat{z}$ (see Figure 1c).

3. Rotate the model by an angle $\theta$ about the new $\hat{M}''$, leading to the transformation

$$\vec{P}''' = R_{\hat{M}'',\theta}\vec{P}''.$$

4. Rotate the model by an angle $\phi$ about $\hat{m}^{\perp} = \hat{z} \times \hat{m}$ (Figure 1d), leading to

$$\vec{P}'''' = R_{\hat{m}^{\perp},\phi}\vec{P}'''.$$

5. Scale by $s$ so that

$$s\,\Pi_{\hat{z}}\vec{M}'''' = \vec{m}.$$

The constraints on the process are that $\vec{N}''''$ should project along $\hat{n}$ with scaled length
$sN = n$, and $\vec{M}''''$ should project along $\hat{m}$ with scaled length $sM = m$.

Now suppose that we don't know the image points exactly, but only to within a disc of
radius $\epsilon$. We want to know the effect of this on the computed transformation, i.e. what is
the range of uncertainty in each of the transformation parameters if each of the image points
is allowed to vary over an $\epsilon$-disc? We divide this analysis as follows. First we consider the
transformation that aligns the model origin and the two model vectors with the image origin
and the two image vectors. The translation is explicit in this computation, and we note that
its error is simply bounded by the image uncertainty, $\epsilon$. We then derive expressions for the
remaining transformation parameters, $\psi$, $\theta$, $\phi$ and $s$, which are only implicit in the alignment
of the model vectors with the image vectors. Given these expressions we are then able to
characterize the effects of sensor uncertainty on these parameters.

3 Aligning the Basis Vectors 

First we note that the translation which brings the model origin into correspondence with the
image origin, as specified in Step 1 of the method, simply has an uncertainty of $\epsilon$, the sensor
uncertainty.

We have some freedom in choosing the model coordinate system, and in particular, we
choose the coordinate frame such that both $\vec{M}$ and $\vec{N}$ are orthogonal to the optic axis $\hat{z}$. In
this case, given the measured image data, the angle $\psi$ is given by:

$$\langle \Pi_{\hat{z}}\hat{M}, \hat{m} \rangle = \langle \hat{M}, \hat{m} \rangle = \cos\psi \qquad (1)$$

$$\langle \Pi_{\hat{z}}\hat{M} \times \hat{m}, \hat{z} \rangle = -\langle \hat{M}, \hat{m}^{\perp} \rangle = \sin\psi. \qquad (2)$$

Because there will be some uncertainty in computing $\psi$, due to the uncertainty $\epsilon$ in the
image points, the first rotation, using Rodrigues' formula, transforms a vector into

$$\vec{P}'' = R_{\hat{z},\psi+\delta\psi}\vec{P}' = \cos(\psi+\delta\psi)\,\vec{P}' + (1 - \cos(\psi+\delta\psi))\langle \vec{P}', \hat{z} \rangle \hat{z} + \sin(\psi+\delta\psi)\,(\hat{z} \times \vec{P}'). \qquad (3)$$

By this we mean that $\psi$ denotes the nominal correct rotation (i.e. the rotation that correctly
aligns the model with the measured image data, without the uncertainty bounds) and $\delta\psi$
denotes the deviation in angle that could result from the $\epsilon$-bounded uncertainty in measuring
the position of the image point.
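Rodrigues' formula, as used in equation (3), is straightforward to evaluate directly. The sketch below is a minimal pure-Python version for a unit axis; the function names `cross` and `rotate_about_axis` and the sample vectors are illustrative choices.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_about_axis(axis, angle, p):
    """Rodrigues' formula: rotate p by `angle` about the unit vector `axis`.

    R p = cos(a) p + (1 - cos(a)) <p, axis> axis + sin(a) (axis x p)
    """
    c, s = math.cos(angle), math.sin(angle)
    d = sum(x * y for x, y in zip(p, axis))
    k = cross(axis, p)
    return tuple(c * pi + (1.0 - c) * d * ai + s * ki
                 for pi, ai, ki in zip(p, axis, k))

# Rotating the x axis by a quarter turn about z should give the y axis.
rotated = rotate_about_axis((0.0, 0.0, 1.0), math.pi / 2, (1.0, 0.0, 0.0))
```

Substituting `angle = psi + dpsi` and `axis = (0, 0, 1)` gives the perturbed first rotation of equation (3).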

If we use the small angle approximation for $\delta\psi$, by assuming $|\delta\psi| \ll 1$, then we have

$$\vec{P}'' \approx R_{\hat{z},\psi}\vec{P}' + \delta\psi\, R_{\hat{z},\psi+\frac{\pi}{2}}\,\Pi_{\hat{z}}\vec{P}'. \qquad (4)$$

For the special case of $\vec{P}' = \hat{M}$, we have

$$\hat{M}'' \approx R_{\hat{z},\psi}\hat{M} + \delta\psi\, R_{\hat{z},\psi+\frac{\pi}{2}}\hat{M} \approx \hat{m} + \delta\psi\,\hat{m}^{\perp}. \qquad (5)$$

Note that the right hand side is not a unit vector, but to a first order approximation, the
expression is sufficient, since we have assumed that $|\delta\psi| \ll 1$. Also note that this assumption is
reasonable, so long as we place lower limits on the length of an acceptable image basis vector,
i.e. we ensure that the length of the vector separating two basis points in the image is much
greater than $\epsilon$.

The second rotation has two components of uncertainty:

$$\vec{P}''' \approx R_{\hat{m}+\delta\psi\,\hat{m}^{\perp},\,\theta+\delta\theta}\,\vec{P}''. \qquad (6)$$

We could expand this out using Rodrigues' formula, and keep only first order terms in $\delta\psi$ and
$\delta\theta$, under a small angle assumption. Unfortunately, we have found experimentally that while
we can safely assume that $\delta\psi$ is small, we cannot make the same assumptions about $\delta\phi$ or $\delta\theta$.
Intuitively this makes sense, because the remaining two rotations $\phi$ and $\theta$ are the slant and
tilt of the model with respect to the image, and small changes in the image may cause large
changes in these rotations. Thus, we keep the full trigonometric expressions for $\delta\phi$ and $\delta\theta$:

$$\vec{P}''' \approx R_{\hat{m},\theta+\delta\theta}\vec{P}'' + \delta\psi\,(1 - \cos(\theta+\delta\theta))\left[\langle \vec{P}'', \hat{m}^{\perp} \rangle \hat{m} + \langle \vec{P}'', \hat{m} \rangle \hat{m}^{\perp}\right] + \delta\psi \sin(\theta+\delta\theta)\,(\hat{m}^{\perp} \times \vec{P}''). \qquad (7)$$

By a similar reasoning, the third rotation gives:

$$\vec{P}'''' = R_{\hat{m}^{\perp},\phi+\delta\phi}\,\vec{P}'''. \qquad (8)$$
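Now suppose we decompose a point $\vec{P}'$ into the natural coordinate system:

$$\vec{P}' = \alpha\hat{m} + \beta\hat{m}^{\perp} + \gamma\hat{z}.$$

Then this point is transformed to

$$\vec{P}'''' = \alpha\hat{m}'''' + \beta\hat{m}^{\perp\,\prime\prime\prime\prime} + \gamma\hat{z}''''.$$

Thus, to see the representation for the transformed point, we simply need to trace through
the transformations associated with each of the basis vectors.

Using equations (4), (7) and (8), we find that rotating the basis vectors, then projecting
the result into the image plane (i.e. along $\hat{z}$) yields (keeping only first order terms in $\delta\psi$)

$$\Pi_{\hat{z}}\hat{m}'''' = \left[\cos(\phi+\delta\phi)\{\cos\psi - \delta\psi \sin\psi \cos(\theta+\delta\theta)\} + \sin\psi \sin(\theta+\delta\theta)\sin(\phi+\delta\phi)\right]\hat{m} + \left[\delta\psi\cos\psi + \sin\psi\cos(\theta+\delta\theta)\right]\hat{m}^{\perp} \qquad (9)$$

$$\Pi_{\hat{z}}\hat{m}^{\perp\,\prime\prime\prime\prime} = \left[\cos(\phi+\delta\phi)\{-\sin\psi - \delta\psi \cos\psi \cos(\theta+\delta\theta)\} + \cos\psi \sin(\theta+\delta\theta)\sin(\phi+\delta\phi)\right]\hat{m} + \left[-\delta\psi\sin\psi + \cos\psi\cos(\theta+\delta\theta)\right]\hat{m}^{\perp} \qquad (10)$$

$$\Pi_{\hat{z}}\hat{z}'''' = \left[\delta\psi\cos(\phi+\delta\phi)\sin(\theta+\delta\theta) + \cos(\theta+\delta\theta)\sin(\phi+\delta\phi)\right]\hat{m} - \sin(\theta+\delta\theta)\,\hat{m}^{\perp}. \qquad (11)$$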

Thus far we have seen how to align the basis vectors of the model with the basis vectors 
measured in the image, by a combination of two-dimensional translation and three-dimensional 
rotation. Before we can analyze the effects of uncertainty on the rotation and scale parameters, 
we need to derive expressions for them. 
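As a sanity check on the first-order expressions for the projected basis vectors, one can compose the three rotations exactly and compare. The sketch below does this under stated assumptions: the image basis is taken as $\hat{m} = x$, $\hat{m}^{\perp} = y$, $\hat{z} = z$; `Th` and `Ph` stand for $\theta+\delta\theta$ and $\phi+\delta\phi$; the second rotation axis is the exactly rotated $\hat{M}$, namely $(\cos\delta\psi, \sin\delta\psi, 0)$; and all numerical values are arbitrary test inputs.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate(axis, angle, p):
    """Rodrigues' formula for a unit axis."""
    c, s = math.cos(angle), math.sin(angle)
    d = sum(x * y for x, y in zip(p, axis))
    k = cross(axis, p)
    return tuple(c * pi + (1.0 - c) * d * ai + s * ki
                 for pi, ai, ki in zip(p, axis, k))

X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)  # m_hat, m_perp, z_hat

def exact_projection(p, psi, dpsi, Th, Ph):
    """Apply the three rotations exactly, then drop the z component."""
    p1 = rotate(Z, psi + dpsi, p)
    axis = (math.cos(dpsi), math.sin(dpsi), 0.0)  # M_hat after the first rotation
    p2 = rotate(axis, Th, p1)
    p3 = rotate(Y, Ph, p2)
    return (p3[0], p3[1])

def first_order_projection(psi, dpsi, Th, Ph):
    """(m_hat, m_perp) components of the projected basis vectors, first order in dpsi."""
    cps, sps = math.cos(psi), math.sin(psi)
    cT, sT, cP, sP = math.cos(Th), math.sin(Th), math.cos(Ph), math.sin(Ph)
    return {
        X: (cP * (cps - dpsi * sps * cT) + sps * sT * sP, dpsi * cps + sps * cT),
        Y: (cP * (-sps - dpsi * cps * cT) + cps * sT * sP, -dpsi * sps + cps * cT),
        Z: (dpsi * cP * sT + cT * sP, -sT),
    }

psi, dpsi, Th, Ph = 0.4, 1e-4, 0.7, -0.3
approx = first_order_projection(psi, dpsi, Th, Ph)
max_err = max(abs(e - a)
              for basis, a_xy in approx.items()
              for e, a in zip(exact_projection(basis, psi, dpsi, Th, Ph), a_xy))
```

With $\delta\psi = 10^{-4}$ the discrepancy `max_err` is of order $\delta\psi^2$, as expected when only second-order terms are dropped.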

4 Computing the Implicit Parameters 

Now we consider how to compute the remaining parameters of the transformation, and
characterize the effects of uncertainty on these parameters. There are some special cases of the
transformation that are of particular importance to us. First, consider the case of $\vec{P}' = \vec{M}$.
We have chosen the model coordinate system so that in this case

$$\alpha = M\cos\psi \qquad \beta = -M\sin\psi \qquad \gamma = 0$$

and thus

$$s\,\Pi_{\hat{z}}\vec{M}'''' = sM\cos(\phi+\delta\phi)\,\hat{m} + sM\,\delta\psi\,\hat{m}^{\perp}, \qquad (12)$$

where $M$ is the length of the vector $\vec{M}$.
If we first consider the transformation that aligns the model with the measured image
points (not accounting for the uncertainty in the image measurements), we want the scaled
result to align properly, i.e.

$$s\,\Pi_{\hat{z}}\vec{M}''''\,\Big|_{\delta\psi=\delta\phi=0} = m\hat{m}.$$

This simply requires that

$$s = \frac{m}{M}\sec\phi. \qquad (13)$$

Note that we only want positive scale factors, so we can assume that $\sec\phi > 0$, or that

$$-\frac{\pi}{2} < \phi < \frac{\pi}{2}.$$

The error vector associated with the transformed $\vec{M}$ is then

$$\vec{E}_M = m\sec\phi\,\delta\psi\,\hat{m}^{\perp} + m\left[\cos\delta\phi - \tan\phi\sin\delta\phi - 1\right]\hat{m}. \qquad (14)$$

We can use this result to put constraints on the range of feasible transformations, and then
on the range of feasible positions for other aligned points.

Since there is an $\epsilon$-disc of error associated with the two endpoints of $\vec{m}$, there will be a
range of acceptable projections. One way to constrain this range is to note that the magnitude
of the error vector should be no more than $2\epsilon$. (This is because the translation to align origins
has an uncertainty disc of $\epsilon$ and the actual position of the endpoint also has an uncertainty
disc of $\epsilon$.) This then implies that

$$\left|\vec{E}_M\right|^2 \le 4\epsilon^2$$

or, with substitutions, that

$$\left[\cos\delta\phi - \tan\phi\sin\delta\phi - 1\right]^2 + (1 + \tan^2\phi)(\delta\psi)^2 \le 4\frac{\epsilon^2}{m^2}. \qquad (15)$$

We could also derive slightly weaker constraints, by requiring that the components of the
error vector in each of the directions $\hat{m}$ and $\hat{m}^{\perp}$ be of magnitude at most $2\epsilon$. In this case,
we are effectively using a bounding box rather than a bounding disc of uncertainty. This leads
to the following constraints:

$$|\delta\psi| \le \frac{2\epsilon}{m}\,|\cos\phi| \qquad (16)$$

$$\left|\cos\delta\phi - 1 - \tan\phi\sin\delta\phi\right| \le \frac{2\epsilon}{m}. \qquad (17)$$

We need to do the same thing with $\vec{N}$ and $\vec{n}$. As noted earlier, we can choose the model
coordinate system so that $\vec{N}$ has no $\hat{z}$ component. This means we can represent $\vec{N}$ by

$$\vec{N} = N\cos\xi\,\hat{M} + N\sin\xi\,\hat{M}^{\perp}$$

where $\xi$ is a known angle. Similarly, we can represent the corresponding image vector by


$$\vec{n} = n\cos\omega\,\hat{m} + n\sin\omega\,\hat{m}^{\perp}$$

where $\omega$ is a measurable angle. This means that the nominal (i.e. no error) projected
transformation of $\vec{N}$ is:

$$\frac{m}{\cos\phi}\frac{N}{M}\left[\cos\xi\cos\phi + \sin\xi\sin\phi\sin\theta\right]\hat{m} + \frac{m}{\cos\phi}\frac{N}{M}\left[\sin\xi\cos\theta\right]\hat{m}^{\perp}. \qquad (18)$$

But in principle this is also equal to

$$\vec{n} = n\cos\omega\,\hat{m} + n\sin\omega\,\hat{m}^{\perp}$$

and by equating terms we have

$$\frac{m}{n}\frac{N}{M}\left[\cos\xi\cos\phi + \sin\xi\sin\phi\sin\theta\right] = \cos\phi\cos\omega \qquad (19)$$

$$\frac{m}{n}\frac{N}{M}\left[\sin\xi\cos\theta\right] = \cos\phi\sin\omega. \qquad (20)$$

These two equations define the set of solutions for the transformation parameters $\phi$ and $\theta$.
(Note that the set of solutions for $\psi$ is given by equations (1) and (2).) There are several ways
of solving these implicit equations in the variables of interest, namely $\phi$ and $\theta$. One way is
given below. First, let

$$\eta = \frac{mN}{nM}.$$

We can rewrite these equations as explicit solutions for $\theta$:

$$\cos\theta = \frac{\sin\omega\cos\phi}{\eta\sin\xi}, \qquad \sin\theta = \frac{\cos\phi\,(\cos\omega - \eta\cos\xi)}{\eta\sin\xi\sin\phi}. \qquad (21)$$

This gives us a solution for $\theta$ as a function of $\phi$. To isolate $\phi$, we use the fact that
$\sin^2\theta + \cos^2\theta = 1$, and this leads to the following equation:

$$\sin^2\omega\cos^4\phi - \left(\eta^2 + 1 - 2\eta\cos\omega\cos\xi\right)\cos^2\phi + \eta^2\sin^2\xi = 0. \qquad (22)$$

This is a quadratic in $\cos^2\phi$, as a function of known quantities, and hence the two solutions
will yield a total of up to four solutions for $\cos\phi$. But since we want $s > 0$, we know that
$\cos\phi > 0$, and so at most two of these solutions are acceptable:

$$\cos\phi = \sqrt{\frac{1 - 2\eta\cos\omega\cos\xi + \eta^2 \pm \sqrt{\left(1 + \eta^2 - 2\eta\cos\omega\cos\xi\right)^2 - 4\eta^2\sin^2\omega\sin^2\xi}}{2\sin^2\omega}}. \qquad (23)$$
Note that this gives real solutions only if

$$\cos\omega\cos\xi \le \frac{1 + \eta^2}{2\eta}. \qquad (24)$$

Also note that we need $\cos\phi \le 1$, so that if this does not hold in the equation, we can exclude
the cases as being impossible. If there are two real solutions for $\phi$, they can be used with
equation (21) to find solutions for $\theta$. Note that solutions to equation (22) will be stable only
when $\sin\omega$ is not small. This makes sense, since the unstable cases correspond to image bases
in which the basis vectors are nearly (anti-)parallel.

The complete version of transforming $\vec{N}$ is given by:

$$sN\left[\cos\xi\cos(\phi+\delta\phi) + \sin\xi\{\sin(\theta+\delta\theta)\sin(\phi+\delta\phi) - \delta\psi\cos(\theta+\delta\theta)\cos(\phi+\delta\phi)\}\right]\hat{m} + sN\left[\cos\xi\,\delta\psi + \sin\xi\cos(\theta+\delta\theta)\right]\hat{m}^{\perp}. \qquad (25)$$

Similar to our analysis of the error associated with the transformed version of $\vec{M}$, we can
set the magnitude of the difference between this vector and the nominal vector to be less than
$2\epsilon$, or we can take the weaker constraints of requiring that the components of the error vector
in two orthogonal directions each have magnitude at most $2\epsilon$. One natural choice of directions
is $\hat{n}$ and $\hat{n}^{\perp}$, but a more convenient, and equally feasible, choice is $\hat{m}$ and $\hat{m}^{\perp}$.

In the latter case, bounding the component of the error in the direction of $\hat{m}^{\perp}$ yields

$$\left|\frac{N}{M}\sec\phi\left[\delta\psi\cos\xi + \sin\xi\cos(\theta+\delta\theta)\right] - \frac{n}{m}\sin\omega\right| \le \frac{2\epsilon}{m}. \qquad (26)$$

The nominal transformation 

To summarize, we compute the nominal transformation, which is when the nominal image
points are correct, as follows:

1. Choose the coordinate system of the model so that the origin lies at $\vec{O}$, and so that $\vec{M}$
and $\vec{N}$ both lie in the $z = 0$ plane.

2. Translate the model so that the origins align ($\Pi_{\hat{z}}$ denotes projection along the $\hat{z}$ axis):

$$\vec{P}' = \vec{P} + \vec{o} - \Pi_{\hat{z}}\vec{O}.$$

3. Rotate the model by angle $\psi$ about $\hat{z}$ so that $\Pi_{\hat{z}}\vec{M}$ lies along $\hat{m}$. $\psi$ is given by:

$$\langle \Pi_{\hat{z}}\hat{M}, \hat{m} \rangle = \langle \hat{M}, \hat{m} \rangle = \cos\psi$$

$$\langle \Pi_{\hat{z}}\hat{M} \times \hat{m}, \hat{z} \rangle = -\langle \hat{M}, \hat{m}^{\perp} \rangle = \sin\psi.$$

4. Rotate the model by angle $\theta$ about the newly resulting $\hat{M}$, which in this case will be $\hat{m}$.

5. Rotate the model by angle $\phi$ about $\hat{m}^{\perp}$. The angles $\phi$ and $\theta$ are found by solving

$$\cos\phi = \sqrt{\frac{1 - 2\eta\cos\omega\cos\xi + \eta^2 \pm \sqrt{\left(1 + \eta^2 - 2\eta\cos\omega\cos\xi\right)^2 - 4\eta^2\sin^2\omega\sin^2\xi}}{2\sin^2\omega}}$$

for $\phi$, and then solving

$$\cos\theta = \frac{\sin\omega\cos\phi}{\eta\sin\xi}, \qquad \sin\theta = \frac{\cos\phi\,(\cos\omega - \eta\cos\xi)}{\eta\sin\xi\sin\phi}$$

for $\theta$, where

$$\eta = \frac{mN}{nM}$$

and where only solutions for which $0 \le \cos\phi \le 1$ are kept.

6. Project into the image, and scale by

$$s = \frac{m}{M}\sec\phi.$$
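Steps 5 and 6 of the summary can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `nominal_pose` and its argument order are hypothetical, $\sin\omega$ and $\sin\xi$ are assumed nonzero, and only the $\phi \ge 0$ branch is returned (equations (19) and (20) are unchanged when both $\phi$ and $\theta$ are negated, so the mirrored pair also satisfies them). The example data are synthesized by running equation (18) forward from a chosen pose.

```python
import math

def nominal_pose(m, n, omega, M, N, xi):
    """Solve the quadratic (22) for cos^2(phi), then (21) for theta and (13) for s.

    Returns a list of (phi, theta, s) with phi >= 0.
    """
    eta = (m * N) / (n * M)
    B = 1.0 + eta ** 2 - 2.0 * eta * math.cos(omega) * math.cos(xi)
    disc = max(B * B - 4.0 * eta ** 2 * math.sin(omega) ** 2 * math.sin(xi) ** 2, 0.0)
    solutions = []
    for sign in (1.0, -1.0):
        c2 = (B + sign * math.sqrt(disc)) / (2.0 * math.sin(omega) ** 2)
        if not 0.0 < c2 <= 1.0:
            continue  # cos(phi) must lie in (0, 1]
        cos_phi = math.sqrt(c2)
        phi = math.acos(cos_phi)
        cos_theta = math.sin(omega) * cos_phi / (eta * math.sin(xi))
        if math.sin(phi) > 1e-12:
            sin_theta = (cos_phi * (math.cos(omega) - eta * math.cos(xi))
                         / (eta * math.sin(xi) * math.sin(phi)))
            theta = math.atan2(sin_theta, cos_theta)
        else:
            # phi = 0: sin(theta) is not determined by (21); use cos(theta) alone.
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        solutions.append((phi, theta, (m / M) / cos_phi))
    return solutions

# Synthetic check: pick a pose, generate the image basis via equation (18), recover it.
phi0, theta0, s0 = 0.5, 0.3, 1.5
M_len, N_len, xi0 = 2.0, 3.0, 1.0
m_len = s0 * M_len * math.cos(phi0)
nx = s0 * N_len * (math.cos(xi0) * math.cos(phi0)
                   + math.sin(xi0) * math.sin(theta0) * math.sin(phi0))
ny = s0 * N_len * math.sin(xi0) * math.cos(theta0)
solutions = nominal_pose(m_len, math.hypot(nx, ny), math.atan2(ny, nx),
                         M_len, N_len, xi0)
```

One of the returned triples should reproduce the pose used to generate the data; the other root of the quadratic, when it lies in $(0, 1]$, is the second acceptable solution mentioned above.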




5 Uncertainty in the Implicit Parameters 


Now we turn to the problem of bounding the range of uncertainty associated with the rotation
and scale parameters of the transformation, given that there are $\epsilon$-bounded positional errors
in the measurement of the image points.

To bound the rotational uncertainties, we will start with equation (15):

$$\left[\cos\delta\phi - \tan\phi\sin\delta\phi - 1\right]^2 + (1 + \tan^2\phi)(\delta\psi)^2 \le 4\frac{\epsilon^2}{m^2}.$$

From this, a straightforward bound on the uncertainty in $\psi$ is

$$|\delta\psi| \le \frac{2\epsilon}{m}\,|\cos\phi| = \frac{2\epsilon}{m}\cos\phi.$$

To solve for bounds on $\delta\phi$, we use the following analysis. Given a value for $\delta\psi$, we have the
implicit equation

$$\cos\delta\phi - 1 - \tan\phi\sin\delta\phi = \mu, \qquad (27)$$

where $\mu$ can range over:

$$-\sqrt{\left(\frac{2\epsilon}{m}\right)^2 - \sec^2\phi\,(\delta\psi)^2} \;\le\; \mu \;\le\; \sqrt{\left(\frac{2\epsilon}{m}\right)^2 - \sec^2\phi\,(\delta\psi)^2}.$$
We can turn this into a quadratic equation in $\cos\delta\phi$ and into a quadratic equation in
$\sin\delta\phi$, leading to two solutions for each variable, and hence a total of four possible solutions.
By enforcing agreement with the implicit equation, we can reduce this to a pair of solutions:

$$\sin\delta\phi = -(1+\mu)\sin\phi\cos\phi + \sigma\cos\phi\sqrt{1 - (1+\mu)^2\cos^2\phi} \qquad (28)$$

$$\cos\delta\phi = (1+\mu)\cos^2\phi + \sigma\sin\phi\sqrt{1 - (1+\mu)^2\cos^2\phi} \qquad (29)$$

where $\sigma = \pm 1$. Note that in order for these equations to yield a real result for $\delta\phi$, we need the
argument to the square root to be non-negative. This yields additional constraints on the range of
legitimate values for $\mu$.

These two equations implicitly define a range of solutions for $\delta\phi$, as $\mu$ varies. Since $\mu \ne -1$
and $\phi \ne \pm\frac{\pi}{2}$ (since this case corresponds to $m = 0$), we can actually simplify this to:

$$\tan\delta\phi = \frac{-\tan\phi + \sigma\sqrt{\frac{1}{(1+\mu)^2\cos^2\phi} - 1}}{1 + \sigma\tan\phi\sqrt{\frac{1}{(1+\mu)^2\cos^2\phi} - 1}}. \qquad (30)$$

By substituting for the two values for $\sigma$ and by substituting for the limits on $\mu$, we can
obtain a range of values for $\delta\phi$. In fact, we get two ranges, one for the case of $\sigma = 1$ and one
for the case of $\sigma = -1$. Only one of these two ranges, in general, will contain 0, and since
this must hold, we can exclude the second range. In fact, when $\mu = 0$, $\tan\delta\phi = 0$, and if we
substitute these values into equation (30), we find that

$$\sigma = \operatorname{sgn}(\tan\phi) = \begin{cases} 1 & \text{if } \tan\phi > 0; \\ -1 & \text{if } \tan\phi < 0. \end{cases}$$

Note that we can simplify the implicit expressions for $\delta\phi$. If we let

$$\nu = \arccos\left[(1+\mu)\cos\phi\right]$$

then

$$\delta\phi = \sigma\nu - \phi. \qquad (31)$$
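Equation (31) together with the limits on $\mu$ gives a direct way to tabulate the range of $\delta\phi$. The sketch below is one possible reading: the function name `delta_phi_range` is hypothetical, $\tan\phi \ne 0$ is assumed so that $\sigma$ is defined, and the arccos argument is clamped to $[-1, 1]$ as a simple stand-in for the extra constraint that keeps the solutions real.

```python
import math

def delta_phi_range(eps, m, phi, dpsi=0.0):
    """Range of delta-phi from eq. (31), with mu swept over its two limits.

    eps: sensor uncertainty; m: length of the image basis vector;
    phi: nominal rotation; dpsi: assumed value of delta-psi.
    """
    if math.tan(phi) == 0.0:
        raise ValueError("sigma = sgn(tan(phi)) is undefined for tan(phi) = 0")
    r = 2.0 * eps / m
    mu_sq = r * r - (dpsi / math.cos(phi)) ** 2
    if mu_sq < 0.0:
        raise ValueError("dpsi exceeds the bound (2 eps / m) |cos(phi)|")
    mu_max = math.sqrt(mu_sq)
    sigma = math.copysign(1.0, math.tan(phi))

    def dphi(mu):
        arg = max(-1.0, min(1.0, (1.0 + mu) * math.cos(phi)))  # keep arccos real
        return sigma * math.acos(arg) - phi

    lo, hi = sorted((dphi(-mu_max), dphi(mu_max)))
    return lo, hi

# Illustrative numbers: 1 pixel of uncertainty on a 50 pixel basis vector.
phi_nom = 0.6
dphi_lo, dphi_hi = delta_phi_range(eps=1.0, m=50.0, phi=phi_nom)
```

Because $\arccos$ is monotone, the extremes of $\delta\phi$ for a fixed $\delta\psi$ occur at the two limits of $\mu$, which is why only the endpoints are evaluated.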

To solve for bounds on $\delta\theta$, we do a similar analysis, using equation (26). We have the
implicit equation

$$\frac{N}{M}\sec\phi\left[\delta\psi\cos\xi + \sin\xi\cos(\theta+\delta\theta)\right] - \frac{n}{m}\sin\omega = \mu, \qquad (32)$$

where

$$-\frac{2\epsilon}{m} \le \mu \le \frac{2\epsilon}{m}.$$

We can write this equation in the form:

$$\cos\theta\cos\delta\theta - \sin\theta\sin\delta\theta = a$$

and using exactly the same kind of analysis, we can solve for $\sin\delta\theta$ and $\cos\delta\theta$:

$$\cos\delta\theta = a\cos\theta + \sigma\sin\theta\sqrt{1 - a^2} \qquad (33)$$

$$\sin\delta\theta = -a\sin\theta + \sigma\cos\theta\sqrt{1 - a^2} \qquad (34)$$

where

$$\sigma = \pm 1$$

$$a = \frac{nM\sin\omega}{mN\sin\xi}\cos\phi - \frac{\delta\psi}{\tan\xi} + \mu\,\frac{M\cos\phi}{N\sin\xi}$$

$$-\frac{2\epsilon}{m}\,|\cos\phi| \le \delta\psi \le \frac{2\epsilon}{m}\,|\cos\phi|$$

$$-\frac{2\epsilon}{m} \le \mu \le \frac{2\epsilon}{m},$$

and where $\mu$ is further constrained to keep the solutions real. Again, by substituting for the
two values for $\sigma$, by substituting for the limits on $\mu$, and by substituting for the limits on $\delta\psi$,
we can obtain a range of values for $\delta\theta$.

Similar to the $\delta\phi$ case, we actually get two ranges, one for the case of $\sigma = 1$ and one for
the case of $\sigma = -1$. Again, in general only one of these two ranges will span 0, and since this
must hold, we can automatically exclude the second range.

Also similar to the $\delta\phi$ case, we can simplify the expression for $\delta\theta$. In particular, if we let

$$\nu = \arccos a$$

then

$$\delta\theta = \sigma\nu - \theta. \qquad (35)$$
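The range of $\delta\theta$ can be tabulated in the same way from equation (35). In the sketch below the function name `delta_theta_range` is hypothetical, and two choices are assumptions rather than statements from the text: $\sigma$ is taken as the sign of $\sin\theta$ (with $\theta$ in $(-\pi, \pi]$), which selects the branch that passes through $\delta\theta = 0$ at the nominal values, and $a$ is clamped to $[-1, 1]$ in place of the extra constraint on $\mu$. Since $a$ is linear in $\mu$ and $\delta\psi$, only the four corner combinations are evaluated.

```python
import math

def delta_theta_range(eps, m, n, M, N, omega, xi, phi, theta):
    """Range of delta-theta from eq. (35), sweeping mu and delta-psi over their limits."""
    r = 2.0 * eps / m
    dpsi_max = r * abs(math.cos(phi))
    sigma = 1.0 if math.sin(theta) >= 0.0 else -1.0  # assumed branch containing 0
    values = []
    for mu in (-r, r):
        for dpsi in (-dpsi_max, dpsi_max):
            a = ((n * M * math.sin(omega)) / (m * N * math.sin(xi)) * math.cos(phi)
                 - dpsi / math.tan(xi)
                 + mu * M * math.cos(phi) / (N * math.sin(xi)))
            a = max(-1.0, min(1.0, a))  # keep arccos real
            values.append(sigma * math.acos(a) - theta)
    return min(values), max(values)

# Synthetic, self-consistent inputs generated from a chosen nominal pose via eq. (18).
phi0, theta0, s0 = 0.5, 0.3, 1.5
M_len, N_len, xi0 = 2.0, 3.0, 1.0
m_len = s0 * M_len * math.cos(phi0)
nx = s0 * N_len * (math.cos(xi0) * math.cos(phi0)
                   + math.sin(xi0) * math.sin(theta0) * math.sin(phi0))
ny = s0 * N_len * math.sin(xi0) * math.cos(theta0)
n_len, omega0 = math.hypot(nx, ny), math.atan2(ny, nx)

dth_lo, dth_hi = delta_theta_range(0.01, m_len, n_len, M_len, N_len,
                                   omega0, xi0, phi0, theta0)
```

With $\epsilon = 0$ the range collapses to a single point at zero, which is a quick consistency check on the expression for $a$.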


To bound the error in scale, we return to equation (12). If we let $\delta s$ denote the multiplicative
uncertainty in computing the scale factor $s$, i.e. the actual computation of scale is $s\,\delta s$, then
by equation (12), one inequality governing this uncertainty is

$$\left[\frac{m\,\delta s\cos(\phi+\delta\phi)}{\cos\phi} - m\right]^2 + \left[\frac{m\,\delta s\,\delta\psi}{\cos\phi}\right]^2 \le 4\epsilon^2. \qquad (36)$$

We can expand this out and solve for limits on $\delta s$, which are given by

$$\delta s \ge \frac{\cos(\phi+\delta\phi)\cos\phi - \cos\phi\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi) + \delta\psi^2\right) - \delta\psi^2}}{\cos^2(\phi+\delta\phi) + \delta\psi^2}$$

$$\delta s \le \frac{\cos(\phi+\delta\phi)\cos\phi + \cos\phi\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi) + \delta\psi^2\right) - \delta\psi^2}}{\cos^2(\phi+\delta\phi) + \delta\psi^2}. \qquad (37)$$

Thus, given bounds on $\delta\psi$, from which we can get a range of values for $\delta\phi$, we can compute
the range of values for $\delta s$.
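The two limits on $\delta s$ are the roots of the quadratic obtained by expanding inequality (36). A minimal sketch, with the hypothetical function name `scale_bounds` and arbitrary sample values:

```python
import math

def scale_bounds(eps, m, phi, dphi, dpsi):
    """Lower and upper limits on the multiplicative scale error delta-s (eq. 37)."""
    r_sq = (2.0 * eps / m) ** 2
    c = math.cos(phi + dphi)
    k = math.cos(phi)
    q = c * c + dpsi * dpsi
    disc = r_sq * q - dpsi * dpsi
    if disc < 0.0:
        raise ValueError("no real delta-s satisfies inequality (36) for these inputs")
    root = k * math.sqrt(disc)
    return (c * k - root) / q, (c * k + root) / q

# With no rotational error the limits reduce to 1 -/+ 2 eps / m.
ds_lo, ds_hi = scale_bounds(eps=1.0, m=50.0, phi=0.6, dphi=0.0, dpsi=0.0)
```

In the error-free case the scale is uncertain by the factor $1 \pm 2\epsilon/m$, which matches the intuition that a $2\epsilon$ error on a basis vector of length $m$ changes its length by at most that fraction.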
In sum, our task was to obtain reasonable bounds on the errors in the six parameters of
the transformation, namely translation, $\delta\psi$, $\delta\theta$, $\delta\phi$ and $\delta s$. The translation is known up to an
$\epsilon$-disc. For the rotations and scale, we used constraints that force $\vec{M}$ and $\vec{N}$ to project near
the uncertainty regions surrounding $\vec{m}$ and $\vec{n}$, respectively. Specifically, let $\vec{E}_M$ be the error
vector from $\vec{m}$ to the transformed and projected $\vec{M}$, and similarly for $\vec{E}_N$ and $\vec{n}$. We first used
a constraint on the magnitude of $\vec{E}_M$ in the direction of $\hat{m}^{\perp}$ to get bounds on $\delta\psi$. Then, using
the bounds on $\delta\psi$ plus a more general constraint on the magnitude of $\vec{E}_M$, we obtained a range
of values for $\delta\phi$. Next, we used the bounds on $\delta\psi$ again plus a constraint on the magnitude
of $\vec{E}_N$ in the direction of $\hat{m}^{\perp}$ to get a range of values for $\delta\theta$. Lastly, we bounded $\delta s$ using the
bounds on $\delta\psi$, $\delta\phi$, and the general constraint on the magnitude of $\vec{E}_M$.

Summary of Bounds on Parameters 

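To summarize, we have the following bounds on uncertainty in the transformation given $\epsilon$
uncertainty in the image measurements. The translation uncertainty is simply bounded by $\epsilon$.
The rotational uncertainty is:

$$|\delta\psi| \le \frac{2\epsilon}{m}\,|\cos\phi|, \qquad (38)$$

and

$$\sin\delta\phi = -(1+\mu)\sin\phi\cos\phi + \operatorname{sgn}(\tan\phi)\cos\phi\sqrt{1 - (1+\mu)^2\cos^2\phi}$$

$$\cos\delta\phi = (1+\mu)\cos^2\phi + \operatorname{sgn}(\tan\phi)\sin\phi\sqrt{1 - (1+\mu)^2\cos^2\phi} \qquad (39)$$

subject to the constraint that:

$$-\sqrt{\left(\frac{2\epsilon}{m}\right)^2 - \sec^2\phi\,(\delta\psi)^2} \;\le\; \mu \;\le\; \sqrt{\left(\frac{2\epsilon}{m}\right)^2 - \sec^2\phi\,(\delta\psi)^2},$$

and

$$\cos\delta\theta = a\cos\theta + \sigma\sin\theta\sqrt{1 - a^2}$$

$$\sin\delta\theta = -a\sin\theta + \sigma\cos\theta\sqrt{1 - a^2} \qquad (40)$$

where

$$\sigma = \pm 1$$

$$a = \frac{nM\sin\omega}{mN\sin\xi}\cos\phi - \frac{\delta\psi}{\tan\xi} + \mu\,\frac{M\cos\phi}{N\sin\xi}$$

$$-\frac{2\epsilon}{m}\,|\cos\phi| \le \delta\psi \le \frac{2\epsilon}{m}\,|\cos\phi|$$

$$-\frac{2\epsilon}{m} \le \mu \le \frac{2\epsilon}{m},$$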
and where $\mu$ is further constrained to keep the solutions real.
The uncertainty in scale is constrained by

$$\delta s \ge \frac{\cos(\phi+\delta\phi)\cos\phi - \cos\phi\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi) + \delta\psi^2\right) - \delta\psi^2}}{\cos^2(\phi+\delta\phi) + \delta\psi^2}$$

$$\delta s \le \frac{\cos(\phi+\delta\phi)\cos\phi + \cos\phi\sqrt{\frac{4\epsilon^2}{m^2}\left(\cos^2(\phi+\delta\phi) + \delta\psi^2\right) - \delta\psi^2}}{\cos^2(\phi+\delta\phi) + \delta\psi^2}. \qquad (41)$$

We note that the bounds on the error for $\delta\theta$ and $\delta s$ are overly conservative, in that we
have not used all the constraints available to us in bounding these errors.


Constraints on the Analysis 

All of the bounds computed are overestimates, up to a few reasonable approximations and
with the possible exception of the scale factor. To bring everything together, we now list and
discuss the approximations and overestimates.

In computing the formula for transforming and projecting a model point into the image,
we assumed that $|\delta\psi| \ll 1$, so that we could use the small angle approximation, which gave
$\cos\delta\psi \approx 1$ and $\sin\delta\psi \approx \delta\psi$, and so that we could drop higher order terms in $\delta\psi$.

Next, we list the sources of our overbounds on the parameters. First, we used a constraint
that the error vector ($\vec{E}_M$) for projection of $\vec{M}$ has magnitude at most $2\epsilon$. This is a weaker
constraint than requiring the destination point of the transformed and projected $\vec{M}$ to be
within the $\epsilon$-circle surrounding the image point at the destination point of $\vec{m}$.

The weak constraint on $\vec{E}_M$ was used directly to bound both $\delta\phi$ and $\delta s$, but an even weaker
version was used to bound $\delta\psi$. The weaker version simply requires the magnitude of $\vec{E}_M$ in the
direction of $\hat{m}^{\perp}$ to be at most $2\epsilon$. Similarly, $\delta\theta$ was bounded with a constraint that forces the
magnitude of $\vec{E}_N$ in the direction of $\hat{m}^{\perp}$ to be at most $2\epsilon$. One indication that the constraint
on $\vec{E}_N$ is weak is that it is independent of $\delta\phi$, the rotation about $\hat{m}^{\perp}$.

Further, it should be observed that another source of overbounding was the treatment of
the constraints on $\vec{E}_M$ and $\vec{E}_N$ as independent. In actuality the constraints are coupled.

Finally, there is one place where we did not clearly overbound the rotation errors, which
are $\delta\psi$, $\delta\theta$, and $\delta\phi$. In computing their ranges of values, we used the nominal value of the scale
factor, whose differences from the extreme values of the scale factor may not be insignificant.


6 Using the Bounds 

The bounds on the uncertainty in the 3D pose of an object, computed from three corresponding 
model and image points, have a number of applications. They can be used to design careful 
verification algorithms, they can be used to design error-sensitive voting schemes, and they
can be used to analyze the expected performance of recognition methods. In this section, we 
consider all three such applications. 

6.1 3D Hough transforms 

We begin by considering the impact of the error analysis on pose clustering methods, such as
the generalized Hough transform [2]. These methods seek to find solutions to the recognition
problem by the following general technique:

1. Consider a pairing of $k$-tuples of model and image features, where $k$ is the smallest
tuple size that defines a complete transformation from model to image.

2. For each such pair, determine the associated transformation.

3. Use the parameters of the transformation to index into a hash space, and at the indexed
point, increment a counter. This implies that the pairing of $k$-tuples is voting for the
transformation associated with that index.

4. Repeat for all possible pairings of $k$-tuples.

5. Search the hash space for peaks in the stored votes, such peaks serving to hypothesize a 
pose of the object. 

While the generalized Hough transform is usually used for matching 2D images to 2D 
objects undergoing rigid transformations in the plane, or for matching 3D objects to 3D 
data, it has also been applied to recognizing 3D objects from 2D images (e.g. [24, 25, 15]). 
In this case, each dimension of the Hough space corresponds to one of the transformation 
parameters. Under a weak perspective imaging model these parameters (as we saw above) 
are two translations, three rotations, and a scale factor. The method summarized at the end 
of Section 4 provides one technique for determining the values of these parameters associated 
with a given triple of model and image points. 

In the case of perfect sensor data, the generalized Hough method generally results in
correctly identified instances of an object model. With uncertainty in the data, however, in
steps (2) and (3) one really needs to vote not just for the nominal transformation, but for the
full range of transformations consistent with the pairing of a given $k$-tuple (in this case triple)
of model and image points. Our analysis provides a method for computing the range of values
in the transformation space into which a vote should be cast:

1. Consider a pairing of 3-tuples of model and image features.

2. For each such pair, determine the associated transformation, using the method of Section
4 to determine the nominal values of the transformation, and using equations (38), (39),
(40) and (41) to determine the variation in each parameter.

3. Use the parameters of the transformation, together with the range of variation in each
parameter, to index into a hash space, and at each indexed point, increment a counter.

4. Repeat for all possible pairings of 3-tuples. 

5. Search the hash space for peaks in the stored votes, such peaks serving to hypothesize a 
pose of the object. 
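
As a minimal sketch of the range-voting scheme in the steps above (an illustration under stated assumptions, not a definitive implementation), the following Python fragment casts a vote into every accumulator cell that overlaps the interval of each parameter. The names `bin_range`, `cast_vote` and `peaks`, the dictionary accumulator, and the treatment of the uncertainty as an axis-aligned box are all assumptions made for illustration. In particular, the bounds on the other parameters depend on the sampled rotation error, so a box is a further overbound, and angular wrap-around is ignored.

```python
from collections import defaultdict
from itertools import product


def bin_range(lo, hi, origin, width, n_bins):
    """Indices of all bins along one axis that overlap the interval [lo, hi]."""
    first = max(0, int((lo - origin) // width))
    last = min(n_bins - 1, int((hi - origin) // width))
    return range(first, last + 1)


def cast_vote(accumulator, nominal, half_ranges, origins, widths, n_bins):
    """Increment every cell overlapping nominal +/- half_range in each dimension."""
    axes = [
        bin_range(c - h, c + h, o, w, n)
        for c, h, o, w, n in zip(nominal, half_ranges, origins, widths, n_bins)
    ]
    for cell in product(*axes):
        accumulator[cell] += 1


def peaks(accumulator, threshold):
    """Cells whose vote count reaches the threshold (candidate poses)."""
    return sorted(cell for cell, votes in accumulator.items() if votes >= threshold)


# Toy two-parameter example; a real pose space here would have six dimensions.
acc = defaultdict(int)
grid = dict(origins=(0.0, 0.0), widths=(1.0, 1.0), n_bins=(10, 10))
cast_vote(acc, (2.5, 3.5), (1.0, 0.2), **grid)   # wide range in the first parameter
cast_vote(acc, (3.4, 3.6), (0.3, 0.3), **grid)   # narrow range
print(peaks(acc, threshold=2))
```

A sparse dictionary is used here because a dense six-dimensional array would be mostly empty; the number of cells touched per vote grows with the product of the per-parameter ranges, which is exactly the redundancy effect analyzed next.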

6.2 Effects of Error Sensitivity on the Hough Transform 

Unfortunately, allowing for uncertainty in the data dramatically increases the chances of false 
peaks in the vote in hash space, because each tuple of model and image points votes for a 
(possibly large) range of transformations. 

Earlier analysis of this effect [12] has shown that the sensitivity of the Hough transform as 
a tool for object recognition depends critically on its redundancy factor, defined as the fraction 
of the pose space into which a single data-model pairing casts a vote. This previous analysis 
was done for 2D objects and 2D images, and for 3D objects and 3D images. Here we examine 
the impact of this effect on using the Hough transform for recognizing 3D objects from 2D 
images. We use the analysis of the previous section to determine the average fraction of the
parameter space specified by a triple of model and image points, given $\epsilon$-bounded uncertainty
in the sensing data. (In [12], experimental data from [25] were used to analyze the behavior,
whereas here we derive analytic values.)

To do this, we simply find the expected range of values for $\delta\psi$, as given by equation (16),
for $\delta\phi$, as given by equation (30), and for $\delta\theta$, as given by equations (33) and (34). We could
do this by actually integrating these ranges for some distribution of parameters. An easier 
way of getting a sense of the method is to empirically sample these ranges. We have done this 
with the following experiment. We created a set of model features at random, then created 
sets of image features at random. We then selected matching triples of points from each set, 
and used them to compute a transformation, and the associated error bounds. For each of the 
rotational parameters, we measured the average range of variation predicted by our analysis. 
The positional uncertainty in the sensory data was set to be $\epsilon = 1$, 3 or 5. The results are
summarized in Table 1, where we report both the average range of uncertainty in angle (in
radians), and the average range normalized by $2\pi$ in the case of $\theta$ and $\psi$ and by $\pi$ in the case
of $\phi$ (since it is restricted to the range $-\pi/2 \le \phi \le \pi/2$).

| $\epsilon$ | | $\delta\psi$ | $\delta\theta$ | $\delta\phi$ | $\delta s$ | $b_r$ | $b_t$ | $b$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Average | .0503 | .2351 | .1348 | .1781 | | | |
| 1 | Normalized | .0080 | .0374 | .0429 | .1215 | 1.284e-5 | 1.257e-5 | 1.961e-11 |
| 3 | Average | .1441 | .5336 | .2708 | .3644 | | | |
| 3 | Normalized | .0229 | .0849 | .0862 | .2485 | 1.676e-4 | 1.131e-4 | 4.710e-9 |
| 5 | Average | .2134 | .7927 | .3524 | .4630 | | | |
| 5 | Normalized | .0340 | .1262 | .1122 | .3158 | 4.814e-4 | 3.142e-4 | 4.777e-8 |

Table 1: Ranges of uncertainty in the transformation parameters. Listed are the average range of uncertainty
in each of the rotation parameters and the range of uncertainty in the multiplicative scale factor. The ranges
are also normalized to the total possible range for each parameter (see text for details). Also indicated are $b_r$,
the redundancy in rotation, $b_t$, the redundancy in translation, assuming that the image dimension is $D = 500$,
and $b$, the overall redundancy. Rows are grouped by $\epsilon = 1$, 3 and 5 respectively.

The product of the three rotation terms, which we term $b_r$, defines the average fraction of
the rotational portion of the pose space that is consistent with a pairing of model and image
3-tuples.

To get the overall redundancy factor (the fraction of the transformation space that is
consistent with a given pairing of model and sensor points), we must also account for the
translational and scale parameters. If $D$ is the size of each image dimension, then the fraction
of the translational portion of pose space consistent with a given pairing of three model and
image points is

$$b_t = \frac{\pi\epsilon^2}{D^2}.$$

In the examples reported in Table 1, we used a value of $D = 500$, where the largest possible
distance between model features in the image was 176 pixels.

To estimate the range of uncertainty in scale, we use the following method. Since the scale
factor is a multiplicative one, we use $\log s$ as the ‘key’ to index into the hash space, so that
when we account for uncertainty in scale, $s\,\delta s$ is transformed into $\log s + \log \delta s$. If we assume
that $s_{\max}$ and $s_{\min}$ denote the maximum and minimum allowed scale factors, then the fraction
of the scale dimension covered by the computed uncertainty range is

$$b_s = \frac{\log \delta s_{\max} - \log \delta s_{\min}}{\log s_{\max} - \log s_{\min}}.$$

In the case of the experiments described in Table 1, we used $s_{\max} = .13$ and $s_{\min} = .03$.

| | $\epsilon = 1$ | $\epsilon = 3$ | $\epsilon = 5$ |
|---|---|---|---|
| Hough | 1.984e-15 | 1.447e-12 | 3.101e-11 |
| Estimate | 1.961e-11 | 4.710e-9 | 4.777e-8 |

Table 2: Comparing fractions of the pose space consistent with a match of 3 image and model points. The
Hough line indicates the average size of such regions for different amounts of sensor uncertainty. The Estimate
line indicates the corresponding sizes using the uncertainty bounds on the transformation parameters derived
in the previous section.


The overall redundancy (fraction of the transformation space consistent with a given triple
of model and image points) is

$$b = b_r\, b_t\, b_s. \qquad (42)$$

Values for $b$ are reported in Table 1.
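
To make the bookkeeping concrete, the following Python sketch recomputes the overall redundancy for the first row group of Table 1. It is an illustration only and rests on assumptions: the function names are hypothetical, the translational fraction is read as $\pi\epsilon^2/D^2$ (which reproduces the $b_t$ column of Table 1), and the "Average" entry for $\delta s$ is read as the logarithmic range $\log \delta s_{\max} - \log \delta s_{\min}$.

```python
import math


def rotation_redundancy(d_psi, d_theta, d_phi):
    """b_r: psi and theta are normalized by 2*pi, phi by pi."""
    return (d_psi / (2 * math.pi)) * (d_theta / (2 * math.pi)) * (d_phi / math.pi)


def translation_redundancy(eps, D):
    """b_t: area of an eps-disc relative to a D x D image (assumed form)."""
    return math.pi * eps**2 / D**2


def scale_redundancy(log_range, s_min, s_max):
    """b_s: log-range of the scale error relative to the allowed log-range of scale."""
    return log_range / (math.log(s_max) - math.log(s_min))


def overall_redundancy(d_psi, d_theta, d_phi, log_range, eps, D, s_min, s_max):
    """b = b_r * b_t * b_s, as in equation (42)."""
    return (rotation_redundancy(d_psi, d_theta, d_phi)
            * translation_redundancy(eps, D)
            * scale_redundancy(log_range, s_min, s_max))


# Average ranges from Table 1 for eps = 1, with D = 500, s_min = .03, s_max = .13.
b = overall_redundancy(0.0503, 0.2351, 0.1348, 0.1781,
                       eps=1, D=500, s_min=0.03, s_max=0.13)
print(f"b = {b:.3e}")
```

Under these readings the result agrees with the tabulated value of $b$ for $\epsilon = 1$ to within rounding of the inputs.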

For this particular example, one can see that while the uncertainty in $\delta\psi$ is quite small,
the uncertainty in the other two rotational parameters can be large, especially for large values
of $\epsilon$. The redundancy in translation is a bit misleading, since it depends on the relative size of
the object to the image. The uncertainty in scale, normalized to the total range of scale, can
in principle be quite large, though this also depends on the total range of possible values.
Note how dramatically the redundancy jumps in going from $\epsilon = 1$ to $\epsilon = 3$.

Evaluating the analysis 

We are overestimating the region of possible transformations, and one obvious question is
how bad this overestimate is. We can explore this by the following alternative analysis. We
are basically considering the following question: Given 3 model points and 3 matching image
points, what is the fraction of the space of possible transformations that is consistent with
this match, given $\epsilon$ uncertainty in the sensed data? This is the same as asking the following
question: For any pose, what is the probability that that pose applied to a set of 3 model points
will bring them into agreement, modulo sensing uncertainty, with a set of 3 image points? If
$D$ is the linear dimension of the image (or the fraction of the image being considered), then
this probability, under a uniform distribution assumption on image points, is

$$\left[\frac{\pi\epsilon^2}{D^2}\right]^3 \qquad (43)$$

because the probability of any of the transformed points matching an image point is just the
probability that it falls within the $\epsilon$ error disc, and by uniformity this is just the ratio of the
area of that disc to the area of the image region.

The expression in equation (43) defines the best that we could do, if we were able to
exactly identify the portion of pose space that is consistent with a match. To see how badly
we are overestimating this region, we compare the results of Table 1 with those predicted by
this model, as shown in Table 2. One can see from this table that our approximate method
overestimates by about a factor of 1000. Considering this is distributed over a 6 dimensional
space, this implies we are overestimating each parameter’s range by about a factor of 3. These
numbers are potentially misleading since they depend on the relative size of the object to the
image. Nonetheless, they give an informal sense of the difference between the ideal Hough case
and the estimates obtained by this method.
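
The comparison in Table 2 can be checked with a few lines of Python. This is a sketch under an assumption: the ideal fraction is taken to be the cube of the ratio of the $\epsilon$-disc area to the image area, $(\pi\epsilon^2/D^2)^3$, which reproduces the Hough row of Table 2. The function name `ideal_fraction` and the dictionary of estimates (copied from Table 2) are introduced only for illustration.

```python
import math

D = 500
# Estimated redundancy b from Table 2, keyed by the sensor uncertainty eps.
estimate = {1: 1.961e-11, 3: 4.710e-9, 5: 4.777e-8}


def ideal_fraction(eps, D):
    """Probability that three transformed points each land in an eps-disc."""
    return (math.pi * eps**2 / D**2) ** 3


for eps, b in estimate.items():
    ideal = ideal_fraction(eps, D)
    ratio = b / ideal                 # overall overestimate of the pose-space volume
    per_parameter = ratio ** (1 / 6)  # spread evenly over the six pose parameters
    print(f"eps={eps}: ideal={ideal:.3e}, ratio={ratio:.0f}, "
          f"per-parameter factor={per_parameter:.2f}")
```

With the tabulated values, the volume ratio ranges from roughly 1,500 at $\epsilon = 5$ to roughly 10,000 at $\epsilon = 1$, which corresponds to a per-parameter factor of about 3 to 5.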

Using the analysis 

Although these values may seem like an extremely small fraction of the 6 dimensional space
that is filled by any one vote for a pairing of 3-tuples, it is important to remember that under
the Hough scheme, all possible pairings of 3-tuples cast votes into the space, and there are
on the order of $m^3 s^3$ such tuples, where $m$ is the number of known model features, and $s$ is
the number of measured sensory features. By this analysis, each such tuple will actually vote
for a fraction $b$ of the overall hash space. Even in the ideal limiting case of infinitesimal sized
buckets in the transformation space, there is likely to be a significant probability of a false
peak.

To see this, we can apply the analysis of [12]. In particular, the probability that a point in
the pose space will receive a vote of size $j$ can be approximated by the geometric distribution,

$$p_j \approx \frac{\lambda^j}{(1+\lambda)^{j+1}} \qquad (44)$$

where $\lambda = m^3 s^3 b$. The probability of at least $\ell$ votes at a point is then

$$p_{\ge \ell} = 1 - \sum_{j=0}^{\ell-1} p_j = \left(\frac{\lambda}{1+\lambda}\right)^{\ell}. \qquad (45)$$
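
As a quick numerical check of the closed form for the tail of this geometric distribution, the sketch below compares the direct summation with $(\lambda/(1+\lambda))^{\ell}$. The function names and the sample values of $m$, $s$ and $b$ are illustrative assumptions, not values taken from the experiments.

```python
def vote_pmf(j, lam):
    """Probability of exactly j votes at a cell: lam^j / (1 + lam)^(j + 1)."""
    return lam**j / (1 + lam) ** (j + 1)


def tail_by_summation(l, lam):
    """Probability of at least l votes, as one minus the first l terms."""
    return 1 - sum(vote_pmf(j, lam) for j in range(l))


def tail_closed_form(l, lam):
    """Probability of at least l votes, in closed form."""
    return (lam / (1 + lam)) ** l


# Hypothetical clutter level: m = 200 model features, s = 100 sensor features,
# and a redundancy b of 1.961e-11.
lam = (200**3) * (100**3) * 1.961e-11
for l in (1, 10, 100):
    print(l, tail_by_summation(l, lam), tail_closed_form(l, lam))
```

The two columns agree up to floating-point rounding, since the first $\ell$ terms form a finite geometric series with ratio $\lambda/(1+\lambda)$.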

That is, this expression denotes the fraction of the cells in pose space that will have votes
at least as large as $\ell$. In most recognition systems, it is common to set a threshold, $t$, on
the minimum size correspondence that will be accepted as a correct solution. Thus we are
concerned with the probability of a false positive, i.e. a set of at least $t$ feature pairings
accidentally masquerading as a correct solution. Suppose we set this threshold by assuming
that a correct solution will have pairings of image features for $t = fm$ of the model features,
where $f$ is some fraction, $0 \le f \le 1$. Since we are using triples to define entries into the hash
table, there will be $\binom{x}{3} \approx x^3$ votes cast at any point that is consistent with a transformation
aligning $x$ of the model features with image features. Thus, if we want the probability of a
false positive accounting for $fm$ of the model features to be less than some bound $\delta$, we need

$$\left(\frac{\lambda}{1+\lambda}\right)^{(fm)^3} \le \delta. \qquad (46)$$

Substituting for $\lambda$ and rearranging the equation leads to the following bound on the number
of sensory features that can be tolerated under these conditions:

$$s \le \frac{1}{m}\left[\frac{1}{b}\,\frac{1}{\delta^{-1/(f^3 m^3)} - 1}\right]^{1/3}. \qquad (47)$$

Following the analysis of [12] we can use the series expansion

$$a^x = \sum_{j=0}^{\infty} \frac{(x \ln a)^j}{j!}$$

together with a Taylor series expansion, to approximate this bound on $s$ by:

$$s_{\lim} \approx \frac{f}{\left(b \ln\frac{1}{\delta}\right)^{1/3}}\left[1 - \frac{\ln\frac{1}{\delta}}{6 f^3 m^3}\right] \approx \frac{f}{\left(b \ln\frac{1}{\delta}\right)^{1/3}}. \qquad (48)$$

Note that to a first order approximation, the limit on the number of sensory features that can
be tolerated, while keeping the probability of a false positive of size $fm$ below some threshold
$\delta$, is independent of the number of model features $m$, and only depends on the redundancy $b$
(and hence the uncertainty $\epsilon$), the fraction $f$ of the model to be matched, and the threshold $\delta$.
To get a sense of the range of values for $s$, we chart in Table 3 the limiting values for $s$ based
on equation (48) and using values for $b$ from Table 1.

| $\delta$ | $\epsilon = 1$, $f = .25$ | $f = .5$ | $f = .75$ | $\epsilon = 3$, $f = .25$ | $f = .5$ | $f = .75$ | $\epsilon = 5$, $f = .25$ | $f = .5$ | $f = .75$ |
|---|---|---|---|---|---|---|---|---|---|
| $10^{-2}$ | 557 | 1114 | 1672 | 90 | 179 | 269 | 41 | 83 | 124 |
| $10^{-3}$ | 487 | 974 | 1460 | 78 | 157 | 235 | 36 | 72 | 109 |
| $10^{-4}$ | 442 | 885 | 1327 | 71 | 142 | 213 | 33 | 66 | 99 |

Table 3: Approximate limits on the number of sensory features, such that the probability of a false positive
of size $fm$ is less than $\delta$, shown for different fractions $f$, and different thresholds $\delta$. The column groups are
for errors of $\epsilon = 1$, 3 and 5 respectively.

One can see from this that, except in the case of very small sensor uncertainty, the Hough
space very rapidly saturates, largely because of the $m^3 s^3$ number of cases that cast votes into
the space. As a consequence, the 3D-from-2D Hough transform will perform well only if some
other process pre-selects a subset of the sensor features for consideration. Note that these
numbers are based on the redundancy values associated with our derived approximation for
the volume of Hough space associated with each pairing of model and image triples of features.
As we saw, in the ideal case, the actual volume of Hough space is smaller, by a factor of about
1000 (Table 2). This means that the values for the limits on sensor clutter in Table 3 in the
ideal case will be larger by roughly a factor of 10. (Of course, this also requires that one
can find an efficient way of exactly determining the volume of Hough space consistent with a
pairing of image and model features.) For the larger uncertainty values, this still leaves fairly
tight limits on the amount of sensory data that can be accommodated. This analysis supports
our earlier work [12], in which we showed, using empirical data, that the Hough transform
works well only when there is limited noise and scene clutter.

6.3 3D Alignment 

Alignment methods differ from pose clustering, or generalized Hough, methods in that the 
computed transformation is directly applied to the model, and used to check for additional 
corresponding model and image features. In order to analyze the effects of sensory uncertainty 
on this type of recognition method, we need to know what happens when a model point is 
transformed and projected into the image. That is, what is the range of positions, about the 
nominal correct position, to which a transformed model point can be mapped? Determining 
this allows us to design careful verification algorithms, in which minimal regions in the image 
are examined for supporting evidence. 

In this section, we describe how to use our analysis to bound the range of possible positions 
of a given model point that is projected into the image. In addition, we illustrate how the 
bounding regions we compute compare to the true regions of uncertainty. Lastly, we look at 
the implications of using our bounds to perform alignment-based recognition. In particular, 
we compute the probability of a false positive match, which is when model points transformed 
under an incorrect transform are aligned to random image points up to error. 

A Simple Verification Algorithm 

In the Alignment Method, pairings of 3 model and image points are used to hypothesize the pose
of an object in the image. In other words, a method such as the one described in Section
4 is used to compute the transformation associated with this correspondence, and then that
transformation is applied to all model features, thereby mapping them into the image. To
verify such a hypothesis, we need to determine which of the aligned features have a match in
the image. While this should really be done on edge features, here we describe a method for
verification from point features (the edge case is the subject of forthcoming work). The key is
to approximate the smallest sized region in which to search for a match for each projected model
point, such that the correct match, if it exists, will not be missed. We can approximate such
regions as follows:

1. Decompose each model point $P$ into the natural coordinate system, so that it transforms
as:

$$\alpha\, m + \beta\, m^{\perp} + \gamma\, z \;\mapsto\; s\,\Pi\left[\alpha\, m'' + \beta\, m^{\perp\prime\prime} + \gamma\, z''\right].$$

The transformation of the basis vectors is given by equations (9), (10) and (11). This
allows us to determine the position of the nominally transformed model point.

2. Select sample values for $\delta\psi$, at some spacing, subject to the conditions of equation (38).

3. For each value, use equations (39), (40) and (41) to compute bounds on the variation
in the other error parameters. This leads to a set of extremal variations on each of the
parameters.

4. For each collection of error values in this set, perturb the nominal transformation
parameters, and compute a new position for the transformed model point. Take the difference
from the nominal point to determine an error offset vector.

5. Expand each error offset vector outward from the nominal point by an additional offset
of $2\epsilon$ to account for the translational uncertainty and the inherent uncertainty in sensing
the point.

6. Add each error vector to the nominal point in the image, and take the convex hull of the
result to get a good estimate of the range of feasible positions associated with a projected
model point.

7. Search over this region for a matching image point.

8. If sufficiently many projected model points have a match, accept the hypothesized pose.
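
The geometric part of this procedure (steps 5 to 8) can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the error offset vectors of step 4 have already been computed, it uses a standard monotone-chain convex hull, and the names `expand`, `convex_hull`, `uncertainty_region`, `inside` and `verify` are hypothetical. A zero offset has no direction and is left unexpanded in this sketch.

```python
import math


def expand(offset, extra):
    """Lengthen an error offset vector by `extra` along its own direction (step 5)."""
    dx, dy = offset
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return (0.0, 0.0)  # no direction to expand along
    return (dx + extra * dx / norm, dy + extra * dy / norm)


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull, returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def uncertainty_region(nominal, offsets, eps):
    """Hull of the nominal point displaced by each expanded offset (step 6)."""
    expanded = (expand(o, 2 * eps) for o in offsets)
    return convex_hull([(nominal[0] + ex, nominal[1] + ey) for ex, ey in expanded])


def inside(hull, q):
    """True if q lies in (or on the boundary of) a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


def verify(regions, image_points, min_matches):
    """Steps 7 and 8: count regions containing an image point, then threshold."""
    matched = sum(any(inside(r, q) for q in image_points) for r in regions)
    return matched >= min_matches


# Toy example: four axis-aligned offsets of length 3 and eps = 1 give a diamond
# whose vertices lie 5 pixels from the nominal point.
region = uncertainty_region((100.0, 100.0), [(3, 0), (0, 3), (-3, 0), (0, -3)], eps=1)
print(region, inside(region, (102, 102)), inside(region, (104, 104)))
```

A linear scan over image points is used for clarity; with many image points, a spatial index over the image would be the more natural choice.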

An example of this is shown in Figure 2. The figure was created by taking a random set
of 3D points as a model, arbitrarily rotating and translating them, projecting them into the
image and scaling with a random scale factor between 0.05 and 0.1, and perturbing the result
randomly with error vectors of magnitude at most $\epsilon$, resulting in a set of corresponding data
points. In this figure, the open circles represent a set of image points, each displayed as an
$\epsilon$-disc. The process is to match 3 model points to 3 data points whose positions are known
up to $\epsilon$-circles. Then, the match is used together with the parameters of the transformation
to compute the uncertainty regions (displayed as polygons) and the crosses, which lie at the
nominal location of the model. Note that the image points corresponding to the model points
could fall anywhere within the polygons, so that simply searching an $\epsilon$-circle for a match would
not be sufficient.

Figure 2: Two examples of uncertainty regions, with perturbation of the data.

One can see that the uncertainty regions vary considerably in size. To get a sense of this
variation, we ran a series of trials as above, and collected statistics on the areas associated
with each uncertainty region, over a large number of different trials of model and image points.
We can histogram the areas of the observed regions, in units of $\pi(2\epsilon)^2$ (the size of the basic disc
of uncertainty). A sample histogram, normalized to sum to 1, is shown in Figure 3, and was
based on 10000 different predicted regions of uncertainty. For the case of $\epsilon = 5$, the expected
area of an uncertainty region is 2165 square pixels. For the case of $\epsilon = 3$, the expected area of
an uncertainty region is 1028 square pixels. For $\epsilon = 1$, the expected area is 195 square pixels.
In all cases, the maximum separation of image features was $m_{\max} = 176$.

Using the analysis 

One advantage of knowing the bounds on uncertainty in computing a transform is that they
can be used to overestimate the regions of the image into which aligned features project. This
gives us a way of designing careful verification systems, in which we are guaranteed to find a
correct corresponding image feature, if it exists, while at the same time keeping the space over
which to search for such features (and thus the number of false matches) relatively small. Of
course, we know that our estimates err on the high side, and it is useful to see how much our
projected image regions overestimate the range of possible positions for matches.

Figure 3: Graph of the distribution of areas of uncertainty regions, as measured in the image, for the case of
$\epsilon = 3$. The horizontal axis is in units of $\pi(2\epsilon)^2$. The vertical axis records the fraction of the distribution of
uncertainty regions with that size.

To do this, we have run the following experiment. We took a 3D model of a telephone,
and created an image of that model under some arbitrary viewing condition. We then chose a
corresponding triple of image and model features and used the method described here both to
determine the alignment transformation of the model and to determine our estimates of the
associated uncertainty regions for each feature, based on assuming $\epsilon$-discs of uncertainty in the
image features. For comparison, we took a sampling of points on the boundary of the $\epsilon$-disc
around each of the basis image points, computed the associated alignment transformation,
and projected each additional model feature into the image. We collected the set of positions
for each projected model point as we allowed the basis points to vary over their $\epsilon$-discs, and
used this to create regions of uncertainty about each aligned point. This should be a very close
approximation to the actual region of uncertainty. We compare these regions to our estimated
regions in Figure 4. One can see that our method does overestimate the uncertainty regions,
although not drastically.

Figure 4: Comparison of ideal uncertainty regions with estimated regions. Each feature in the model is
projected according to the nominal transformation, as illustrated by the points. The dark circles show the
ideal regions of uncertainty. The larger enclosing convex regions show the estimated uncertainty regions
computed by our method. Two different solutions are shown.

Finally, we can use our estimates of the uncertainty regions to estimate the probability
that a random pairing of model and image bases will collect votes from other model points.
That is, if we use a random alignment of the model and project the remaining transformed
model points into the image, on average each such point will define an uncertainty region of
the size computed above. If we consider an image of dimension $D = 500$, then the selectivity
of the method (i.e. the probability that each model point will find a potentially matching
image point in its uncertainty region) is 0.000781, 0.00411 and 0.00866 for $\epsilon = 1$, 3 and 5
respectively. By comparison, the selectivity for the case of a planar object in arbitrary 3D
position and orientation, for the same level of sensor uncertainty, is 0.000117, 0.001052 and
0.002911 respectively (Table 1 of [14]). Although they represent overestimates, these results
suggest that the selectivity of recognition methods applied to 3D objects should be only slightly
worse than when applied to 2D objects.
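
The selectivity figures quoted above are consistent with dividing the expected area of an uncertainty region by the image area $D^2$. The following Python sketch makes that reading explicit; the reading itself is an assumption, the function name `selectivity` is hypothetical, and the areas and planar values are simply those quoted in the text.

```python
D = 500
# Expected area of an uncertainty region (square pixels), keyed by eps.
expected_area = {1: 195, 3: 1028, 5: 2165}
# Selectivity quoted for a planar object under the same uncertainty (Table 1 of [14]).
planar = {1: 0.000117, 3: 0.001052, 5: 0.002911}


def selectivity(region_area, image_dim):
    """Chance that a uniformly placed image point falls inside the region."""
    return region_area / (image_dim * image_dim)


for eps, area in expected_area.items():
    sel = selectivity(area, D)
    print(f"eps={eps}: selectivity={sel:.6f}, ratio to planar={sel / planar[eps]:.1f}")
```

On these numbers the 3D selectivity exceeds the planar one by a factor of roughly 3 to 7, with the gap narrowing as $\epsilon$ grows.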

To see this, we can use the analysis of [14] to estimate limits on the number of image
features that can be tolerated, while maintaining a low false positive rate. Recapping from
that earlier work, the false positive rate is computed by the following method:

1. The selectivity of the method is defined by the probability that the uncertainty region
associated with a projected model point contains an image point, and this is just the
redundancy (fraction of the transformation space that is consistent with a given triple
of model and image points) $b$, as defined in equation (42).

2. Since each model point is projected into the image, the probability that a given model
point matches at least one image point is

$$p = 1 - (1 - b)^{s-3}$$

because the probability that a particular model point is not consistent with a particular
image point is $(1 - b)$ and by independence, the probability that all $s - 3$ points are not
consistent with this model point is $(1 - b)^{s-3}$.

3. The process is repeated for each model point, so the probability of exactly $k$ of them
having a match is

$$q_k = \binom{m-3}{k} p^k (1-p)^{m-3-k}. \qquad (49)$$

Further, the probability of a false positive identification of size at least $k$ is

$$w_k = 1 - \sum_{i=0}^{k-1} q_i.$$

Note that this is the probability of a false positive for a particular sensor basis and a
particular model basis.

4. This process can be repeated for all choices of model bases, so the probability of a false
positive identification for a given sensor basis with respect to any model basis is

$$e_k = 1 - (1 - w_k)^{\binom{m}{3}}. \qquad (50)$$

Thus, we can compute limits on $s$ such that $e_k \le \delta$, where $\delta$ is some threshold on the false
positive rate, and where $k$ is taken to be $fm$ for some fraction $0 \le f \le 1$. In Table 4, we list
these limits, computed using equation (50) and values of $b$ obtained from the ratio of areas
described in equation (42).

| $\delta$ | $\epsilon = 1$, $f = .25$ | $f = .5$ | $f = .75$ | $\epsilon = 3$, $f = .25$ | $f = .5$ | $f = .75$ | $\epsilon = 5$, $f = .25$ | $f = .5$ | $f = .75$ |
|---|---|---|---|---|---|---|---|---|---|
| $10^{-2}$ | 149 | 480 | 1069 | 30 | 93 | 205 | 16 | 45 | 98 |
| $10^{-3}$ | 139 | 457 | 1028 | 28 | 89 | 197 | 15 | 43 | 95 |
| $10^{-4}$ | 130 | 437 | 991 | 27 | 85 | 190 | 14 | 42 | 90 |

Table 4: Approximate limits on the number of sensory features, such that the probability of a false positive
of size $fm$ is less than $\delta$, shown for different fractions $f$, and different thresholds $\delta$. The column groups are
for errors of $\epsilon = 1$, 3 and 5 respectively, and for $m = 200$.

While these results give a sense of the limits on alignment, they are potentially slightly
misleading. What they say is that if we used the derived bounds to compute the uncertainty
regions in which to search for possible matches, and we use no other information to evaluate a
potential match, then the system saturates fairly quickly. As we know from Figure 4, however,
our method overestimates the uncertainty regions, and a more correct method, such as that
described earlier in which one uses samples of the basis uncertainty regions to trace out ideal
uncertainty regions for other points, would lead to much smaller uncertainty regions, smaller
values for the redundancy $b$, and hence a more forgiving verification system. Also, one could
clearly augment the test described here to incorporate additional constraints on the pose and
its uncertainty that can be obtained by using additional matches of model and sensory features
to further limit the associated uncertainty (e.g. given 4 pairs of matched points, use sets of 3
to compute regions of uncertainty in pose space, intersect these regions and use the result to
determine the pose's uncertainty).
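
The false positive computation recapped in steps 1 to 4 above can be sketched in Python as follows. This is an illustration under assumptions rather than the procedure used to produce Table 4: it reads the exponent in equation (50) as the number of model bases $\binom{m}{3}$, sums the binomial tail directly to avoid cancellation, and finds the limit on $s$ by a linear scan. All function names are hypothetical.

```python
import math


def match_probability(b, s):
    """p = 1 - (1 - b)^(s - 3): a projected model point finds some image point."""
    return -math.expm1((s - 3) * math.log1p(-b))


def tail_probability(p, n, k):
    """w_k: probability that at least k of n model points find a match."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def false_positive_rate(b, s, m, k):
    """e_k: false positive of size >= k for one sensor basis over all model bases."""
    p = match_probability(b, s)
    w_k = tail_probability(p, m - 3, k)
    if w_k >= 1.0:
        return 1.0
    n_bases = math.comb(m, 3)  # assumed exponent: number of model bases
    return -math.expm1(n_bases * math.log1p(-w_k))


def max_sensor_features(b, m, f, delta, s_max=5000):
    """Largest s (up to s_max) for which the false positive rate stays <= delta."""
    k = math.ceil(f * m)
    best = None
    for s in range(3, s_max + 1):
        if false_positive_rate(b, s, m, k) > delta:
            break  # the rate grows with s, so later values also fail
        best = s
    return best


# Selectivity quoted in the text for eps = 1, with m = 200 model features.
limit = max_sensor_features(b=0.000781, m=200, f=0.25, delta=1e-2)
print(limit)
```

The value printed can be compared against the corresponding entry of Table 4. The use of `expm1` and `log1p` keeps the computation stable when $w_k$ is many orders of magnitude below one, which is the regime of interest here.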

7 Summary 

A number of object recognition systems compute the pose of a 3D object using a small
number of corresponding model and image points. When there is uncertainty in the sensor
data, this can cause substantial errors in the computed pose. We have derived expressions
bounding the extent of uncertainty in the pose, given $\epsilon$-bounded uncertainty in the
measurement of image points. The particular pose estimation method that we analyzed is that
of [12, 13], which determines the pose from 3 corresponding model and image points under a
weak perspective imaging model. Similar analyses hold for other related methods of estimating
pose.

We then applied this analysis in order to analyze the effectiveness of two classes of
recognition methods that use pose estimates computed in this manner: the generalized Hough
transform and alignment. We found that in both cases, the methods have a substantial chance
of making a false positive identification (claiming an object is present when it is not), for even
moderate levels of sensor uncertainty (a few pixels).